[57] ABSTRACT

A method and apparatus for loading memory within a 
reconfigurable programmable logic device including config- 
uring the device as a RAM loader circuit, loading the RAM 
with data and then reconfiguring the device with a circuit 
utilizing the loaded RAM. The inventive method and appa- 
ratus allow use of the RAM as high density functional 
centers of the desired design immediately upon initialization 
of the circuit, without wasting valuable time or FPGA 
resources on a static, non-flexible RAM loader structure. 

15 Claims, 4 Drawing Sheets 
STRUCTURE AND METHOD FOR LOADING
RAM DATA WITHIN A PROGRAMMABLE 
LOGIC DEVICE 

BACKGROUND OF THE INVENTION 

1. Field of the Invention

The present invention relates to programmable devices, 
and more particularly to programmable devices having 
dedicated, on-board random access memory (RAM) and a
method for loading data into the RAM. 

2. Description of the Related Art 

Digital logic can be implemented using several options: 
discrete logic devices, often called small-scale integrated 
(SSI) circuits, programmable devices such as programmable 
logic arrays (PLAs), masked-programmed gate arrays or 
cell-based application specific integrated circuits (ASICs), 
and field programmable gate arrays (FPGAs). 

FPGAs are general purpose programmable devices that 
are customized by the end users. FPGAs are composed of an 
array of configurable logic blocks that are programmably 
interconnected, usually at power-up. The basic device archi- 
tecture of an FPGA consists of an array of configurable logic 
blocks (CLBs) embedded in a configurable interconnect 
structure and surrounded by configurable I/O blocks (IOBs). 
An IOB allows signals to be driven off-chip or optionally 
brought on to the FPGA interconnect segments. The IOB can 
typically perform other functions, such as tri-stating output
signals and registering incoming or outgoing signals. The 
configurable interconnect structure allows users to imple- 
ment multi-level logic designs, wherein the output signal of 
one logic unit is provided as an input signal to another logic 
unit and the output signal of that logic unit is provided as an 
input signal to another logic unit, and so on. 

Each configurable logic block in an FPGA typically 
includes configuration memory cells for controlling the 
function performed by that logic block. These configuration 
memory cells can implement lookup tables and control 
multiplexers and other logic elements within a CLB, such as 
XOR gates and AND gates. Lookup tables implement the 
combinational logic function corresponding to the truth table 
stored in the configuration memory cell. 
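
As a minimal illustrative sketch (not part of the original disclosure), a lookup table can be modeled in Python as a list of configuration bits addressed by the input signals. The class name `LookupTable`, the 4-input size, and the convention that the first input is the most significant address bit are assumptions chosen only for illustration.

```python
class LookupTable:
    """Hypothetical model of a k-input lookup table (LUT)."""

    def __init__(self, truth_table):
        # truth_table[i] is the output bit for input pattern i.
        n = len(truth_table)
        if n == 0 or n & (n - 1):
            raise ValueError("truth table length must be a power of two")
        self.bits = list(truth_table)
        self.num_inputs = n.bit_length() - 1

    def evaluate(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        # Pack the input bits into an address (first input is the MSB).
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.bits[address]


# A 4-input XOR (parity) function stored as 16 configuration bits.
xor4 = LookupTable([bin(i).count("1") & 1 for i in range(16)])
```

Changing the 16 stored bits changes the logic function without changing the structure, which is the sense in which the configuration memory cells "implement" the truth table.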

An FPGA can support tens of thousands of gates of logic 
operating at system speeds of tens of megahertz. The FPGA 
is programmed by loading programming data into the 
memory cells controlling the configurable logic blocks, I/O 
blocks, and interconnect structure. Further information 
about FPGAs and programming protocol appears at pages 
2-7 to 2-46 of "The Programmable Logic Data Book,"
incorporated herein by reference, copyright 1994 by Xilinx,
Inc., the assignee of this invention, and available from 
Xilinx, Inc. at 2100 Logic Drive, San Jose, Calif. 95124. 

Configuring an FPGA includes loading a bitstream con- 
taining the desired design program data for the CLBs, IOBs, 
and the configurable interconnect into a plurality of con- 
figuration memory cells on the FPGA. The bitstream is 
typically loaded into the FPGA serially to minimize the
number of pins required for configuration and to reduce the 
complexity of the interface to external memory, although 
serial data can be converted to parallel form for increased 
speed. The configuration data, once loaded into the configu- 
ration memory cells, dictates the functions performed by the 
CLBs, IOBs, and configurable interconnect within the 
FPGA. 

Recently, there has been a dramatic increase in the com- 
plexity and size of logic circuits used in a variety of 
applications. 
Since the number of CLBs that can be fabricated on a
single integrated circuit chip is limited, the increasing
number of elements in desired logic circuits often cannot be
implemented within a single FPGA. Thus, there is a need to
improve the efficiency and functionality of FPGAs and
FPGA-implemented logic functions. This efficiency and
functionality improvement can be achieved by increasing
the amount of memory available on a part through
configuration of lookup tables as Random Access Memory (RAM)
blocks, thereby increasing the capacity of the part (FPGA
capacity is often measured in terms of equivalent logic gates
and RAM capacity). But this approach has the disadvantage
that lookup tables configured as RAM consume a fairly large
amount of silicon chip area.

One prior art approach to solve this shortcoming of
existing FPGAs has been to connect multiple FPGAs
externally. However, because of the limited number of IOBs
available to connect FPGAs, not all circuits can be
implemented by this approach. Moreover, using more than one
FPGA increases the power consumption, cost, and space
required to implement a circuit. Therefore, the multi-device
approach provides only a partial solution.

Another method used in the industry to address this
challenge is increasing the quantity of logic and interconnect
resources within FPGAs. However, for any given fabrication
technology, there will be limitations to the number of CLBs
that can be fabricated with the necessary interconnect on a
single FPGA part. Thus, there continues to be a need, from
an architectural standpoint, to increase the functional
capacity of FPGAs.

One solution currently under development in the industry
is the incorporation of coarse-grained block RAM into the
FPGA architecture. Coarse-grained RAM may be broadly
defined as a memory block having the capacity to store more
than several lookup tables' worth of memory elements, or
bits, contrasted with fine-grained RAM, such as
memory-configured lookup tables, which can be configured as very
small blocks as fine as 16x1 bits and smaller. Coarse-grained
block RAM can also be distinguished from lookup tables
configured as memory by higher densities and faster access
times. Also, coarse-grained block RAM provides a more
predictable delay than smaller lookup table-based memory
blocks because lookup table-based memory blocks are
dependent upon configured decoders for functionally linking
smaller elements into blocks. Such configured decoders add
an element of uncertainty to circuit timing and create
significant delays which vary with the size of the linked
block. In contrast, coarse-grained RAM blocks have built-in
decoders with predictable timing characteristics.
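
To make the contrast concrete, the following Python sketch models a larger memory assembled from 16x1 lookup-table RAMs behind a configured address decoder. It is an illustration only: the class name `LutBackedMemory` and the decode scheme (upper address bits select a bank, lower four bits select a cell) are assumptions, not details taken from any particular device.

```python
LUT_DEPTH = 16  # one fine-grained 16x1 lookup-table RAM, as named in the text


class LutBackedMemory:
    """Hypothetical depth x width memory built from 16x1 LUT RAMs."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        # Number of banks the configured decoder must choose between.
        self.banks = -(-depth // LUT_DEPTH)  # ceiling division
        # luts[bank][bit] is a single 16x1 LUT RAM.
        self.luts = [[[0] * LUT_DEPTH for _ in range(width)]
                     for _ in range(self.banks)]

    @property
    def lut_count(self):
        return self.banks * self.width

    def _decode(self, address):
        # Models the configured decoder: bank select plus cell offset.
        if not 0 <= address < self.depth:
            raise IndexError("address out of range")
        return divmod(address, LUT_DEPTH)

    def write(self, address, value):
        bank, offset = self._decode(address)
        for bit in range(self.width):
            self.luts[bank][bit][offset] = (value >> bit) & 1

    def read(self, address):
        bank, offset = self._decode(address)
        return sum(self.luts[bank][bit][offset] << bit
                   for bit in range(self.width))


mem = LutBackedMemory(256, 8)
mem.write(200, 0xA5)
```

Under these assumptions a 256x8 memory needs 128 lookup tables and a 16-way decoder, and the decoder grows with the size of the linked block. A coarse-grained block avoids this growth because its decoder is built in.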

Moreover, routing challenges created by the use of
configurable memory cells are eliminated by the use of
coarse-grained block RAM. The large capacity per area of
coarse-grained block RAM enables designers to implement
complex functions in one logic level without the routing
delays associated with linked CLBs in multi-level functions.
For example, a single block RAM could accommodate a
large multiplier, control logic for a state machine,
coefficients for a digital signal filter, or any other desired structure
compatible with the available RAM configurations. Also,
logic functions can be implemented in coarse-grained RAM
by programming the RAM with a read-only pattern, creating
a large Read Only Memory (ROM). Coarse-grained RAM
and other types of quickly accessible memory in block form
are thus highly compatible with FPGA architectures and
significantly increase FPGA functionality.
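
The idea of a logic function stored as a read-only pattern can be sketched as follows. This Python example is illustrative only: it fills a 256x8 table so that a single read returns the product of two 4-bit operands. The function names and the choice of placing operand `a` in the upper address bits are assumptions.

```python
def build_multiplier_rom(operand_bits=4):
    """Return a table whose entry at address (a << operand_bits) | b is a * b."""
    size = 1 << (2 * operand_bits)
    mask = (1 << operand_bits) - 1
    return [(addr >> operand_bits) * (addr & mask) for addr in range(size)]


def rom_multiply(rom, a, b, operand_bits=4):
    """Multiply by a single table read, as a one-level logic function."""
    limit = 1 << operand_bits
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operand out of range")
    return rom[(a << operand_bits) | b]


rom = build_multiplier_rom()
product = rom_multiply(rom, 7, 9)
```

With 4-bit operands the table has 256 entries and the largest product is 225, so every entry fits in 8 bits. The lookup is useful only if the table already holds this non-zero pattern when the design starts running.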

While adding memory block types other than lookup 
tables configured as memory can help increase device 
density, problems remain. A common default initialized state
for available block RAM within programmable devices is an
"all zero" state. However, using block RAM as anything
other than read/write memory, such as circuitry for
implementing complex logic functions or on-board ROM,
requires a non-zero state immediately after device
configuration. Thus, it is desirable for block RAM on an FPGA to
default to an all-zero state, but be capable of starting at a
non-zero initialization state as well. This challenge is
addressed by the present invention.

SUMMARY OF THE INVENTION

The present invention provides a method (and apparatus)
for loading data into a random access memory (RAM) block
within a programmable logic device and comprises the steps
of configuring the programmable logic device as a first
circuit capable of loading the data into the RAM block,
loading the data into the RAM block, and configuring the
programmable logic device as a second circuit which utilizes
the data loaded into the RAM block. Preferably, the
configuration information for the first circuit, the data, and
configuration information for the second circuit are
combined to form a single data stream which is input to the
programmable logic device.

In the preferred embodiment, the memory block is
comprised of a plurality of memory elements, the programmable
logic device is a field programmable gate array (FPGA) or
a partially reconfigurable field programmable gate array
(FPGA). Input lines to the programmable logic device
function as input lines for loading the data through the first
circuit.

In the preferred embodiment the data is received in a
register means in the first circuit and forwarded to the RAM
block. The data is written to a predetermined position within
the RAM block using an addressing means. Both the register
means and the addressing means are controlled.

BRIEF DESCRIPTION OF THE DRAWINGS

The aforementioned advantages of the present invention
as well as additional advantages thereof will be more clearly
understood hereinafter as a result of a detailed description of
a preferred embodiment of the invention when taken in
conjunction with the following drawings in which:

FIG. 1 is a block diagram showing an
FPGA with which the present invention may be used;

FIG. 2 is a block diagram showing an FPGA having
coarse-grained, on-board RAM, said FPGA being
compatible with the method of the present invention;

FIG. 3 illustrates a closer view of the FPGA of FIG. 2;

FIG. 4 is a block diagram of a circuit design for loading
RAM within an FPGA.

DETAILED DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an FPGA 10 comprising a plurality of
tiles 12, each tile 12 including a configurable logic block
(CLB) 14, a configuration memory 16, and a configurable
routing matrix 18. The routing matrices 18 connect to each
other and thus interconnect the CLBs. The routing matrices
also connect to external pins via lines 44.

Each tile 12 implements a portion of a user's circuit. The
logic function of each tile 12 is carried out by its
respective CLB 14. Each CLB 14, and thus each tile 12, is
capable of performing a plurality of different logic functions.
The logic function performed by a CLB 14 is determined by
the data stored in the configuration memory 16
corresponding to CLB 14. CLB 14 includes input terminals for
receiving logic function data and configuration data, as well as
output terminals for forwarding logic function data.

Configurable routing matrix 18 includes configuration
input terminals for receiving configuration data and, in
response to receipt of the configuration data, controlling the
coupling of the CLB data input and output terminals.
According to the data stored in configuration memory 16,
configurable routing matrix 18 selectively connects the
external pins of the FPGA 10 to various CLBs 14, and
selectively interconnects predetermined CLBs 14.

Configuration data for each CLB 14 and each
configurable routing matrix 18 of a tile 12 are stored in the
corresponding configuration memory 16. The configuration
data are loaded into the configuration memory 16 from
outside the FPGA 10, and may be loaded through a shift
register having at least one external pin. Such a loading
means is discussed by Freeman in U.S. Pat. No. Re. 34,363,
incorporated herein by reference. The output terminals of
each configuration memory 16 are coupled to their
respective CLB 14 and configurable routing matrix 18.

FIG. 2 illustrates an FPGA having coarse-grained RAM
blocks 100 located at points within the logic array for easy
access and interconnection with selected CLBs 14. Also,
each RAM block 100 has access to local interconnect for
interconnection with other RAM blocks in the associated
vertical column to form enlarged RAM arrays. FIG. 3
provides a closer view of the FPGA having coarse-grained
RAM blocks 100.

While the inclusion of RAM in an FPGA or other PLD in
the manner shown in FIGS. 2 & 3 is useful, a significant
portion of valuable chip area must be dedicated to a structure
for routing data to the RAM. This dedicated loading
structure represents wasted space in users' designs requiring an
initial all-zero state for the on-chip RAM, since
coarse-grained block RAM could, for these designs, be set to
automatically achieve such an all-zero state without a
dedicated loading structure.

The present invention avoids the need to hardwire a
structure for loading on-chip RAM. Specifically, the present
invention exploits the availability of a configurable
environment in an FPGA wherein, given the proper bitstream for
design configuration, the FPGA itself can function as a RAM
loader, thereby conserving valuable on-chip resources while
allowing for a non-zero initialized on-chip RAM state.
Also, because on-board RAM can be selectively
interconnected to represent different functional block shapes
(e.g., 256x8-bit, 512x4-bit, 1,024x2-bit, 2,048x1-bit), the
inventive RAM loader design configuration can be
optimized for a RAM layout optimized for data loading, while
the user's subsequently loaded FPGA design can include an
entirely different RAM configuration optimized for the
user's design.

The present invention therefore includes a method and
system for loading data into FPGA on-board memory
structures (e.g., coarse-grained blocks of internal, dedicated RAM)
to enable utilization of such structures immediately upon
boot up, without waiting for post-programming
initialization, and in a manner which exploits the flexibility
of FPGA structure. In a broad sense, the method of the
present invention includes three steps: configuring a
programmable device as a RAM loader circuit, loading the
on-board RAM, and reconfiguring the device as desired.

A preferred embodiment of the RAM loading circuit
design of the present invention is illustrated in FIG. 4. RAM
loader circuit design 150 is compatible with any
programmable logic device configuration loading mode desired by
the user. For example, a device could load the RAM
loader circuit design under its own clock (master mode), as
a peripheral device, as a slave to another device, or within
a chain of serially loading devices (a daisy chain). Each of
these loading modes is described in The Programmable
Logic Data Book at pages 2-7 to 2-46, cited above. The
loading mode desired by the user is set via a 3-bit signal sent
to the mode input terminal of control logic block 114 in FIG.
4.

The circuit design of FIG. 4 includes a number of
bus-width references for various input lines, output lines and
connections between blocks. These references are provided
only for example and are not intended to limit the scope of
the present invention to any particular bus-width or FPGA
part constraint. The numbers shown in RAM loader circuit
design 150 correspond to an FPGA part with 2^16 bits of
design data and 2^12 bytes (2^15 bits) of coarse-grained RAM
space.

Also, for the design of FIG. 4, it is assumed that the
coarse-grained RAM on the programmed FPGA part can be
configured to simulate a single block (represented by block
101) with a continuous address space of, for example, 2^12
8-bit elements, even though a number of RAM blocks are
actually distributed throughout the FPGA. Once the block is
loaded with the desired data for circuit initiation, the RAM
block can then be reconfigured for any layout desired (e.g.,
a 512 element first-in, first-out memory stack (FIFO), or two
2kx8-bit ROMs). For a device wherein the RAM size or
layout is different, design 150 may be altered accordingly in
a manner known to those skilled in the art to which the
present invention pertains.

Control Logic Block 114 tracks the desired configuration
mode and, if necessary, ensures compliance with any
required handshaking protocol. Control logic output signals
A through Z are fed to a number of other blocks in design
150 to set constant values, enable clocks, reset values and
select inputs to multiplexers. Labeled signals A through D
exemplify the function and connectivity of all other control
logic signals E through Z, not shown.

When configuring in master mode with serial data input,
external counter 108 resets to 2^16, i.e., the address of the bit
following the end of the program data used to create loader
design 150. Done? block 106 includes a comparator which
signals via output 107 that RAM block 100 is full once
External Counter 108 reaches 2^16+2^15 (3x2^15) (the total
number of bits in the bitstream loaded up to that point
including loader design configuration data plus dedicated
RAM data). Multiplexer 116 routes data from shift register
110 to shift register 104 when the design is loading data in
serial mode.

In parallel mode, control logic block 114 signals
multiplexer 116 to forward 8-bit data from D_IN directly to shift
register 104 (i.e., bypassing shift register 110). Shift register
110 functions to parallelize the serial data input D_IN0 into a
byte-wide form for feeding into RAM block 100, configured
for byte-wide data loading. Internal counter 102 and the
RAM block write enable terminal W.E. are clocked from
clock divider 112 (in this embodiment, an 8 to 1 divider) to
allow sufficient time for shift register 110 to buffer a full byte
before the RAM block address tracked by internal counter
102 is incremented. Shift register 104 can be clocked from
either the standard clock signal CLK or 8 to 1 clock divider
112, depending upon the configuration mode selected.

In master parallel mode, wherein the FPGA drives its own
clock and outputs address destinations to the data source,
external counter 108 resets to 2^13 to address the bit following
the program data to be loaded from external ROM or flash
memory. Done? block 106 signals that the RAM is full once
the external counter hits 2^13+2^12 (the total number of
parallel-loaded bytes including loader circuit 150
configuration data (for example) and RAM data to be loaded).
Multiplexer 116 routes data directly from D_IN. Internal
counter 102 and RAM block write enable terminal W.E. are
clocked directly from signal CLK via multiplexer 118. In
another embodiment, external counter 108 could be set to
decrement rather than increment and Done? block 106 is set to
trigger when external counter 108 reaches 2^18-2^16-2^15-1,
rather than 2^16+2^15.

In slave serial configuration mode, external counter 108
functions only to count the number of bits read. Signal
FULL from Done? block 106 is no longer required off-chip.
As in other configuration modes, internal counter 102 steps
through the on-chip RAM addresses. Shift registers 104 and
110 serve to parallelize the input data.

In peripheral mode, wherein the FPGA shares data bus
access with other devices, control logic block 114 responds
to the Busy/Ready input signal. Specifically, the Busy signal
can be used as a handshake signal to start and stop the
loading of RAM block 100 in response to use of the data bus
by other devices.

RAM loader design 150 also functions with daisy
chained FPGA parts in any compatible configuration mode.
Moreover, when FPGAs are daisy chained, RAM loader
circuit design 150 can be configured in all of the FPGAs
simultaneously, since the design would be identical for every
identical FPGA in the chain. In the circuit of FIG. 4, gate 122
functions as an output enable for daisy chained devices. Gate
120 functions to disable RAM block 100 in daisy chain
mode when data destined for devices further down the chain
pass through circuit 150.

Another advantage of RAM loader circuit design 150 is
the use of the same address I/O pins for sending data to
RAM block 100 as those used for configuring the circuit,
thereby allowing the use of a continuous bitstream including
the loader configuration and RAM data, as well as the
ultimately desired FPGA design configuration bitstream.
The ability to interchange I/O and address pins allows this
single bitstream to flow onto the FPGA from an off-chip
memory device without intervention by the user. In fact, the
RAM loader circuit design 150 can be added to the bitstream
corresponding to the user's desired RAM data and circuit
design without the user's knowledge, thereby allowing the
manufacturer the opportunity to maintain a competitive
advantage through the use of an improved loader circuit, the
exact layout of which remains unknown to the user or
competitor. Moreover, because configuration code can be
automatically added to a user's configuration bitstream,
simple software upgrades can allow the manufacturer to
improve the utilized RAM loader circuit in a backward
compatible fashion for all devices in the field or customized
for all available devices without inconveniencing the user.

While the present invention has been described with
reference to certain preferred embodiments, those skilled in
the art to which the present invention pertains will now, as
a result of the applicant's teachings herein, recognize that
various modifications and other embodiments may be
provided. By way of example, the precise loader design and
configuration or data bit sequence may be modified while
preserving the advantages of the coarse-grained block RAM
loading and non-zero initialization ability of the present
invention. Also, the preferred embodiment of the present
invention is compatible with partially configurable
FPGAs. These and other variations upon and modifications
to the embodiment described herein are deemed to be within
the scope of the invention which is to be limited only by the
following claims.
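
The counter arithmetic of the serial master-mode example can be checked with a short behavioral sketch in Python. This is a software model under stated assumptions, not the circuit of FIG. 4. The names `load_ram_serial` and `bytes_to_bits` are hypothetical, the loader configuration is represented by placeholder zero bits, the bits of each byte are assumed to arrive most significant bit first, and the 16 trailing bits merely stand in for the second configuration.

```python
CONFIG_BITS = 2 ** 16      # bits of loader design data (example part)
RAM_BYTES = 2 ** 12        # bytes of coarse-grained RAM (example part)
RAM_BITS = RAM_BYTES * 8   # equals 2 ** 15


def bytes_to_bits(data):
    """Serialize bytes into a bit list, MSB first (an assumption)."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def load_ram_serial(bitstream):
    """Model the external counter, shift register, and internal counter."""
    ram = [0] * RAM_BYTES
    external_counter = CONFIG_BITS       # first bit after the loader config
    done_at = CONFIG_BITS + RAM_BITS     # comparator threshold, 3 * 2**15
    internal_counter = 0                 # on-chip RAM byte address
    shift_register = 0
    bits_buffered = 0
    while external_counter != done_at:
        bit = bitstream[external_counter]
        shift_register = ((shift_register << 1) | bit) & 0xFF
        bits_buffered += 1
        external_counter += 1
        if bits_buffered == 8:           # one tick of the 8-to-1 divider
            ram[internal_counter] = shift_register
            internal_counter += 1
            bits_buffered = 0
    return ram, external_counter


ram_image = [(7 * i + 3) & 0xFF for i in range(RAM_BYTES)]
bitstream = [0] * CONFIG_BITS + bytes_to_bits(ram_image) + [1] * 16
ram, final_count = load_ram_serial(bitstream)
```

In this model the external counter stops at 2^16+2^15 and the RAM holds the intended image before any bit of the second configuration is consumed, which mirrors the ordering of the three steps of the method.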
What is claimed is: 

1. A method of loading data into a random access memory
(RAM) block within a programmable logic device, said 
method comprising the steps of: 

configuring the programmable logic device as a first
circuit capable of loading the data into the RAM block;
loading the data into the RAM block; and
configuring the programmable logic device as a second
circuit which utilizes the data loaded into the RAM
block.

2. The method of claim 1, wherein configuration
information for said first circuit, the data, and configuration
information for said second circuit are combined to form a
single data stream which is input to the programmable logic
device.

3. The method of claim 1, wherein the memory block is
comprised of a plurality of memory elements. 

4. The method of claim 1, wherein the programmable
logic device is a field programmable gate array (FPGA).

5. The method of claim 1, wherein the programmable
logic device is a partially reconfigurable field programmable 
gate array (FPGA). 

6. The method of claim 1, wherein input lines to the
programmable logic device function as input lines for
loading said data through said first circuit.

7. The method of claim 1, wherein the method further 
comprises the steps of: 

receiving said data in a register means in the first circuit
and forwarding said data to said RAM block;

writing said data to a predetermined position within said 
RAM block using an addressing means; 

controlling said register means and said addressing 
means. 

8. A computer-implemented system for loading data into
a volatile memory block located within a programmable
logic device, the system comprising:

means for forwarding a first configuration bitstream from 
a data storage unit to the programmable logic device to
thereby initially configure the programmable logic
device as a first circuit capable of loading the data into 
the volatile memory block; 
means for supplying the data from the data storage unit to 
the programmable logic device configured as the first 
circuit for loading the data into the volatile memory 
block; and 

means for forwarding a second configuration bitstream 
from said data storage unit to the programmable logic 
device to thereby reconfigure the programmable logic 
device as a second circuit.

9. The computer-implemented system of claim 8, wherein
said second circuit utilizes said data stored in the volatile 
memory block. 

10. The computer-implemented system of claim 8, 
wherein said first and said second configuration bitstreams 
are combined with said data to form a consolidated bit- 
stream. 

11. The computer-implemented system of claim 8,
wherein the volatile memory block is comprised of a plu- 
rality of dedicated memory elements. 

12. The computer-implemented system of claim 8,
wherein the programmable logic device is a field program- 
mable gate array (FPGA). 

13. The computer-implemented system of claim 8, 
wherein the programmable logic device is a partially recon- 
figurable field programmable gate array (FPGA). 

14. The computer-implemented system of claim 8,
wherein input lines to the programmable logic device
function as input lines for loading said data through said first
circuit.

15. The computer-implemented system of claim 8, 
wherein said first circuit comprises: 

register means for receiving said data and forwarding said
data to said memory;

addressing means for writing said data to a predetermined
position within said memory; and

control means for controlling said register means and said
addressing means.

***** 